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1.  Introduction 


Inverse  molecular  design  (IMD)  beyond  the  purview  of  drug  design  has  enjoyed 
increasing  popularity  recently.  Examples  can  be  found  in  protein  design1"4  or 
high-hyperpolarizability  materials.5"7  The  design  problem  is  complicated  by  the 
vastness  of  possible  chemicals,  termed  chemical  space.  This  space  can  be  viewed 
as  combinatorially  complex  (e.g.,  208  ~  2.6  •  1010  octapeptides  of  the  naturally 
occurring  amino  acids  exist  alone).  As  a  consequence,  a  variety  of  methods  have 
been  developed  for  the  discrete  optimization  in  chemical  subspaces.8"15  The 
continuous  optimization  of  chemicals  used  in  the  linear-combination-of-atomic- 
potentials  (LCAP)  method1617  and  the  variation-of-particles  density-functional- 
theoretical  (VP-DFT)  method  introduces  an  important  concept  for  dealing  with  the 
inherent  roughness  of  chemical  compound  space.18"20  LCAP  and  VP-DFT 
interpolate  continuously  between  the  Hamiltonians  of  various  chemical  species. 
Furthermore,  an  investigation  into  the  reasons  why  chemical  optimization  is, 
relatively  speaking,  “easy”  recently  utilized  probability  distributions  and 
expectations  on  the  control  variables  to  arrive  at  its  conclusion.21,22  Follow-up 
work  discovered  that  indeed  all  properties  can  be  optimized  efficiently  using  only 
the  charge  density  with  as-yet  unknown  functionals.23  All  previous  optimizations 
were  executed  using  at  most  single  constraints.  In  this  contribution,  we  introduce  a 
flexible  method  for  multiple  nonlinear  (inequality)  constraints  applied  to  a 
combinatorially  complex  search  space  of  energetic  materials. 

In  recent  years,  organic  molecules  have  garnered  increasing  attention  as 
components  of  high-hyperpolarizability  materials,  partly  due  to  the  variety  of 
synthetically  accessible  compounds,  cost,  and  ease  of  processing.24,25  Applications 
for  materials  with  high  hyperpolarizabilities  are  found  in  telecommunication  and 
optics.26  The  dominant  nonlinear  response  of  organic  molecules  often  finds  its 
origin  in  the  conjugated  7r-system,  which  facilitates  the  electronic  polarizability. 
This  lends  itself  to  advanced  electronic  and  photonic  applications  including  optical 
information  processing,  photovoltaic  cells,  photodynamic  therapy  agents,  and 
many  others.26"28  The  design  of  such  molecules  in  silico  is  complicated  by  the  fact 
that  chemical  space,  even  constrained  to  smaller  organic  compounds,  is 
combinatorially  complex.  The  number  of  organic  molecules  of  medium  size  is 
estimated  to  be  on  the  order  of  lO200.29  Enumeration  is  therefore  unfeasibly  costly 
and  other  methods  for  property  optimization  need  to  be  developed.  Including 
conformational  searching  further  complicates  molecular  design. 
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The  optical  spectra  and  nonlinear  optical  (NLO)  properties  of  chromophores  may 
be  conveniently  manipulated  by  modifying  the  molecular  architecture, 
substitutional  groups,  or  substitution  patterns. 26,30-32  The  hyperpolarizability  tensor 
are  the  third  derivatives  of  the  energy  with  respect  to  the  electric  field,33,34 

d3E 

^ijk  =  dFidFjdFk'  (1) 

Commonly  only  the  averaged  vector  components  of  the  vector, 

A  =  |  Ej  Pijj  +  Pjij  +  Pjji  are  used.  In  particular  the  component  in  the  direction 
of  the  dipole  moment,  Ax  =  jj^jj  Yhi=x  y  z  A/T*  and  the  vector  norm, 

A2  =  Ei.=X,y,  z  df,  play  an  important  role  in  electro-optic  activity. 

Several  equivalent  methods  predict  electro-optic  trends  correctly,27  despite  the 
abundance  of  methods  and  conventions  for  determining  hyperpolarizabilities 
experimentally.26,35  The  sum-over-states  expression  of  the  tensor  components  in 
the  limit  of  infinite  wavelength, 


a jk  —  'y  ^ 


(0|xj|/r)(/r|xj|i/)(i/|xfc|0) 


F,v 


(2) 


where  Eg  is  the  excitation  energy  from  the  ground  state  to  excited  state  a,  links 
the  linear  absorption  spectrum  to  the  hyperpolarizability.36  From  Eq.  2,  low 
excitation  energies  coupled  with  large  ground-to-excited-state 
transition-dipole-moments  lead  to  large  hyperpolarizabilities  (i.e.,  a  material 
which  absorbs  everywhere  exhibits  maximal  hyperpolarizability).  But  such  a 
material  would  not  be  able  to  transmit  anything.  Hence,  it  is  necessary  to  navigate 
the  trade-off  between  hyperpolarizability  and  optical  constraints  by  tuning  the 
excited-to-excited-state  transition-dipole-moments  while  maintaining  transparency 
or  absorption  at  the  target  wavelength(s). 


Therefore,  the  main  challenge  to  designing  efficient,  electro-optic  (EO)  materials 
lies  in  retaining  good  linear  optical  properties.  For  instance,  transparent 
chromophores  in  a  visible  yellow  spectral  range  are  typically  small  molecules  with 
low  EO  response,  while  molecules  possessing  large  hyperpolarizabilities  are 
frequently  opaque  or,  at  best,  have  small  windows  of  visibility.  The  visibility 
window  in  these  NLO  frameworks  is  generally  bracketed  by  2  kinds  of  electronic 
transitions.28,30,31  The  blue  transition  is  a  local  excitation  of  tt-tt*  character,  while 
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the  red  transition  is  characterized  by  a  significant  intramolecular  charge-transfer. 
The  charge-transfer  peak  is  typically  the  maximum  absorption  peak  (A max)  while 
the  7T-7T*  transition  has  a  lower  oscillator  strength  and  is  often  denoted  as  \max-  i  ■ 
Optimal  chromophores,  which  are  transparent  in  the  visible  spectrum,  would  have 
a  red  Amor-pcak,  while  the  \max-  i  -peak  should  lie  in  the  blue  or  UV  spectral 
region  to  open  a  large  visibility  window.  In  the  case  of  telecommunications 
applications,  it  is  important  to  achieve  transparency  in  the  1.30-1.55  |am  minimum 
range.37  Therefore,  the  design  principle  for  such  chromophores  calls  for  the 
Am„.r-pcak  to  be  significantly  blue-shifted.  In  this  report,  we  investigate  the  effects 
due  to  the  electronic  properties  of  the  7r-system  of  the  bridge  between  donor  and 
acceptor. 

Computational  tools  are  increasingly  playing  a  role  in  assisting  experimental 
design  of  optimal  chromophores  by  curtailing  the  vast  optimization  space  of 
chromophore  structures. 14,27-38^3  Density-functional  theory  currently  offers  the 
best  compromise  between  accuracy  and  computational  performance  for  typical 
chromophores  of  about  100  atoms.  The  electronic  spectra  are  often  calculated 
using  time-dependent  density-functional  theory  (TD-DFT)44  and  numerous 
successes  of  this  technique  have  been  recently  reviewed.45 

2.  Optimization  Methodology 

Our  optimization  problem  is  formulated  as  a  constrained  maximization, 


max  P(x) 
xeccs 


_  re\x)/ 400 

s.t.  :  7 r(x)  =  /  f^x)e~ff(3,~1)ady  <  30, 

^  Jdx)n  00 


ei  V 700 


(3) 


where  tt(x)  is  the  penalty  due  to  the  absorbance  of  compound  x  from  the  chosen 
subspace  (CCS)  of  chemical  compound  space  discussed  later  in  the  range 
400-700  nm,  e-'^  and  fjx)  are  the  excitation  wavelength  and  the  normalized 
oscillator  strength  of  state  i  for  compound  x,  respectively,  a  =  72  is  an 
approximate  broadening  coefficient,  and  P(x)  is  the  hyperpolarizability  /30.  The 
constraint  value  of  30  is  equivalent  to  an  average  absorption  of  0.1  in  the  400-  to 
700-nm  range.  We  reformulate  Problem  3  via  a  non-negative  Lagrange  multiplier 
0  <  A  G  M  for  the  constraint  in  the  augmented  Lagrangian  function 
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C(x,  A)  :=  P(x)  —  Xtt(x)  to 


max  min  P(x)  —  Xn(x).  (4) 

xeCCS  A>0 

The  Problem  4  is  then  solved  altematingly  between  the  primal  variable(s)  x  and 
the  dual  viariables  A. 

We  begin  by  describing  the  optimization  procedure  within  CCS.  The  chemical 
subspace  that  we  investigate  is  depicted  in  Fig.  1.  Each  of  the  8  Xs  on  1  represents 
a  substitution  site  for  which  one  of  4  chemical  groups  is  attached. 
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Fig.  1  Optimization  framework:  Each  X  may  be  replaced  by  -H,  -F,  -Cl,  or  -Br  for  a  total  of 
216  possible  molecules. 


The  overall  search  space  thus  encompasses  216  molecules.  Each  such  substitution 
site  represents  an  independent  search  direction.14  There  is  a  natural  enumeration 
that  follows  from  the  input  (see  Listing  4  in  the  Appendix),  where  the  compound 
{xi  =  Xt}  is  assigned  the  number  n(x)  =  E,  II  ;q  NjXi,  where  Nj  is  the 
number  of  options  on  substitution  site  j  and  N0  =  1. 

The  substitutions  are  enumerated  and  thus  can  be  viewed  as  the  integer  positions 
on  a  circle.  The  optimization  proceeds  with  a  local  line  search  for  each  substitution 
site  in  the  prescribed  order  of  sites.  When  neighboring  substitutions  in  the  current 
direction  to  the  current  iterate  are  inferior,  the  line  search  is  halted  and  the  next 
direction  is  searched. 

After  all  search  directions  have  been  searched,  the  minimization  with  respect  to  A 
is  performed  using  the  dual  function  P^( A)  =  maxxeCcs  £(x,  A)  via  its 
approximation 

p[d)W  =  max .C(x,X)  <  P(d)(  A),  (5) 

where  CCS'  is  the  subset  of  visited  (i.e.,  already  computed)  molecules.  Due  to  the 
discrete  nature  of  CCS',  P'(dj  is  a  piece- wise  linear  function  of  A  with  general 
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derivative  — 7t(x(A))  ,  where  x(X)  :=  arg  iiiax^^y  C(y,  A).  Due  to  the 
piece-wise  linearity,  A  only  changes  meaningfully  when  x(X)  ^  x(A').  Thus, 
whenever  n(x)  ^  0,  A  is  updated  cumulatively  by 

AA  =  aTi(x*)  max  {inf  {A'  G  M+|£(rc,  A*  +  A')  >  jC(x*,  X*  +  A')}  ,  0}  ,  (6) 

where  a  >  1,  CCS"  is  the  set  of  molecules  that  have  already  been  computed,  x*  is 
the  currently  active  molecule,  and  A*  is  the  currently  active  Lagrange  multiplier. 

By  choosing  the  update  along  the  constraint  violation  direction  n(x*),  the  update 
of  A  is  a  steepest  descent  update  with  a  conservative  step  size.  The  step  size  is 
chosen  such  that  either  x*  violates  constraints  the  least  as  well  as  maximizing  P  or 
that  there  exists  x'  G  CCS"  with  an  improved  C(x',  A  +  AA)  >  C(x*,  A  +  AA). 
The  optimization  then  returns  to  optimizing  over  CCS  as  described  before  until  no 
improvement  can  be  attained  or  the  maximum  number  of  steps  is  exceeded.  The 
general  program  flow  is  visualized  in  Fig.  2. 

Since  generally  there  is  no  a  priori  knowledge  of  the  proper  search  order  of  each 
substituent,  the  initial  assignment  is  expected  to  be  generally  unsuited  for  smooth 
optimization.  To  mitigate  this  problem,  the  enumeration  of  substituents  is 
reassessed  after  each  full  cycle  of  local  searches  in  each  direction.  For  each 
substituent  the  average  arc  tangent  of  the  Lagrangian  value  is  computed, 

T.  arct  an  C  (x ,  A) 

cgas,|„.,l  ,  3.  e  CCS'  :  X.  =  j  t  (7) 

aj,  otherwise 

where  a?  is  a  default  policy  for  unseen  substituents  (discussed  on  the  next  page), 
which  is  subsequently  used  to  order  the  substituents  around  the  circle  starting  with 
the  lowest  scoring  substituent  and  placing  substituents  altematingly  to  the  left  and 
right  in  ascending  order.  The  resulting  order  produces  a  smooth  ordering  which  is 
monotonically  increasing  until  the  maximum  is  reached,  and  then  monotonically 
decreasing  until  the  minimum  is  reached,  or  vice  versa  depending  on  which  part  of 
the  circle  one  starts  and  which  direction  one  goes.  The  arc  tangent  is  used  because 
some  computations  may  fail  due  to  convergence  issues  in  geometries  or  general 
instability  of  the  molecule,  which  are  mapped  to  a  Lagrangian  value  of  —  oo. 

We  investigate  4  different  strategies  for  assigning  values  to  unseen  substituents. 
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Fig.  2  Flowchart  of  algorithm 


•  Algorithm  1 :  Complete  a  full  sweep  of  all  substituents  on  the  first  iteration. 
After  the  initial  scan  there  are  no  unseen  substituents  anymore. 

•  Algorithm  2:  Assign  the  unseen  substituent  the  same  value  as  the  current 
iterate;  this  guarantees  that  the  unseen  substituents  are  immediate  neighbors 
of  the  current  iterate  and  that  at  least  one  will  be  seen  in  the  next 
optimization  step. 

•  Algorithm  3:  Assign  the  unseen  substituent  the  value  0.  This  is  easily 
implemented  but  may  have  random  results. 

•  Algorithm  4:  Assign  the  unseen  substituents  the  maximum  value.  This 
implies  a  bias  toward  unseen  substituents  to  be  better  in  general.  As  the 
optimization  proceeds  this  converges  with  the  second  method. 
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No  reordering  is  denoted  as  Algorithm  0.  Four  further  algorithms  are  compared: 


•  Algorithm  5:  Conduct  a  full  line  search  in  each  direction.  No  reordering  is 
necessary,  but  presumably  far  more  compounds  need  to  be  computed  for 
each  direction. 

•  Algorithm  6:  Reordering  occurs  as  per  Algorithm  3,  but  unlike  in  Eq.  6,  A  is 
chosen  such  that  £(A,  x*)  <  £(A,  x'),  where  x'  e  arg  min y^ccs'  n(y). 
Hence,  Algorithm  6  solves  the  subproblem 

max  P(x)  (8) 

xeCCS1  v 

s.t.  :  n(x)  <  30. 

•  Algorithm  7:  Following  the  protocol  of  optimizing  on  a  hypercube.14 

•  Algorithm  8:  Same  as  Algorithm  7,  but  with  deleting  compounds  x  with 

7 t(x)  >  tt(x*)  from  the  library,  effectively  reordering  the  library,  where  x*  is 
the  current  iterate. 

Since  the  latter  2  methods  are  unaware  of  the  underlying  structure  of  the  search 
space,  they  may  be  expected  to  perform  erratically. 

3.  Computational  Chemistry  Protocol 

All  geometries  were  optimized  with  PM6  using  Gaussian  0946  as  described 
elsewhere47  (see  Listing  1  in  the  Appendix  for  specifics).  Hyperpolarizabilities  (3^ 
were  computed  at  the  CNDO  level  of  theory.14  Spectra  were  computed  at  the 
BH&HLYP/6-31+G(d)  level  in  the  time-dependent  density-functional 
framework48-49  using  Gaussian  09.46 

Before  further  analysis,  the  conformational  space  was  explored  to  ensure  only 
low-energy  species  were  considered.47  Listing  2  in  the  Appendix  summarizes 
these  steps. 
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Br  Cl 


Fig.  3  Best  candidate  found 


4.  Results  and  Discussion 

Despite  expectations  Algorithm  5  does  not  always  perform  better  at  reducing  the 
violation  than  the  other  algorithms  (see  start  number  41668  in  Table  1).  As 
expected,  the  Algorithm  7  variants  tend  to  perform  worse  than  the  Algorithm  5 
variants.  Surprisingly,  the  Algorithm  8  actually  performs  worse  on  average  than 
Algorithm  7.  Algorithm  6  uses  the  fewest  iterations  on  average  per  individual  run. 
The  largest  number  of  computed  compounds  in  a  single  run  was  138,  which 
constitutes  merely  0.02%  of  the  library.  Despite  minimizing  n.  Algorithm  6  results 
in  the  lowest  penalty  violation  once  (see  start  number  41668  in  Table  1).  Except 
for  start  number  8389,  Algorithm  0  is  inferior  with  respect  to  fulfilling  constraints 
to  all  other  methods  but  Algorithm  8,  which  fails  to  outperform  Algorithm  0  for 
start  number  0  only. 

Table  1  Comparison  of  the  performance  of  the  assorted  reordering  schemes  on  multiple  start¬ 
ing  compounds 


Method 

Start 

#  Molecules 

Property 

Penalty 

Efficiency 

Algorithm  5 

0 

119 

121.31 

15.5295 

0.000541122 

Algorithm  0 

85 

133.88 

23.2389 

0.000506251 

Algorithm  1 

81 

124.352 

21.5181 

0.000573735 

Algorithm  2 

74 

133.88 

23.2389 

0.000581504 

Algorithm  3 

74 

124.352 

21.5181 

0.000628007 

Algorithm  4 

93 

126.962 

18.8063 

0.00057176 

Algorithm  6 

49 

124.352 

21.5181 

0.000948418 

Algorithm  7 

64 

132.119 

23.5097 

0.000664619 

Algorithm  8 

80 

130.764 

27.0287 

0.000462471 

Algorithm  5 

4710 

114 

121.31 

15.5295 

0.000564856 
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Table  1  Comparison  of  the  performance  of  the  assorted  reordering  schemes  on  multiple  start¬ 
ing  compounds  (continued) 


Method 

Start 

#  Molecules 

Property 

Penalty 

Efficiency 

Algorithm  0 

45 

164.923 

38.6927 

0.000574326 

Algorithm  1 

84 

139.494 

21.5761 

0.000551757 

Algorithm  2 

91 

121.31 

15.5295 

0.000707622 

Algorithm  3 

93 

121.31 

15.5295 

0.000692404 

Algorithm  4 

93 

121.31 

15.5295 

0.000692404 

Algorithm  6 

76 

125.011 

17.7595 

0.000740893 

Algorithm  7 

55 

125.011 

17.7595 

0.00102378 

Algorithm  8 

48 

163.287 

35.6982 

0.000583596 

Algorithm  5 

8389 

120 

105.054 

3.5658 

0.002337016 

Algorithm  0 

111 

105.054 

3.5658 

0.002526504 

Algorithm  1 

118 

105.054 

3.5658 

0.002376627 

Algorithm  2 

139 

105.339 

8.68666 

0.000828195 

Algorithm  3 

139 

105.333 

8.68666 

0.000828195 

Algorithm  4 

128 

125.799 

18.8994 

0.000413373 

Algorithm  6 

63 

93.8483 

5.66247 

0.002803196 

Algorithm  7 

64 

104.418 

8.92695 

0.001750318 

Algorithm  8 

81 

105.333 

8.68666 

0.001421223 

Algorithm  5 

41329 

142 

105.054 

3.5658 

0.001974943 

Algorithm  0 

75 

164.923 

38.6927 

0.000344596 

Algorithm  1 

111 

112.876 

11.5523 

0.000779845 

Algorithm  2 

111 

111.944 

7.44892 

0.001209438 

Algorithm  3 

112 

105.054 

3.5658 

0.002503946 

Algorithm  4 

111 

111.944 

7.44892 

0.001209438 

Algorithm  6 

85 

96.982 

5.35516 

0.002196892 

Algorithm  7 

77 

129.505 

23.4977 

0.000552693 

Algorithm  8 

97 

125.799 

18.8994 

0.000545482 

Algorithm  5 

41668 

107 

115.973 

12.729 

0.000734213 

Algorithm  0 

88 

132.798 

22.3781 

0.000507802 

Algorithm  1 

104 

93.8483 

5.66247 

0.00169809 

Algorithm  2 

137 

126.962 

18.8063 

0.000388129 

Algorithm  3 

87 

115.973 

12.729 

0.000902997 

Algorithm  4 

138 

126.962 

18.8063 

0.000385316 
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Table  1  Comparison  of  the  performance  of  the  assorted  reordering  schemes  on  multiple  start¬ 
ing  compounds  (continued) 


Method 

Start  #  Molecules 

Property 

Penalty 

Efficiency 

Algorithm  6 

101 

105.054 

3.5658 

0.002776653 

Algorithm  7 

87 

118.986 

13.5325 

0.000849381 

Algorithm  8 

112 

105.333 

8.68666 

0.001037109 

Table  2  Ratio  of  objective  to  constraint,  rounded  to  uniquely  identifying  ratios 


Method 

0 

4710 

8389 

41329 

41668 

Algorithm  5 

7.81 

7.81 

29.46 

29.46 

9.11 

Algorithm  0 

5.76 

4.26 

29.46 

4.26 

5.93 

Algorithm  1 

5.78 

6.47 

29.46 

9.77 

16.57 

Algorithm  2 

5.76 

7.81 

12.13 

15.03 

6.75 

Algorithm  3 

5.78 

7.81 

12.13 

29.46 

9.11 

Algorithm  4 

6.75 

7.81 

6.66 

15.03 

6.75 

Algorithm  6 

5.78 

7.04 

16.57 

18.11 

29.46 

Algorithm  7 

5.62 

7.04 

11.66 

5.51 

8.79 

Algorithm  8 

4.84 

4.57 

12.13 

6.66 

12.13 

Table  3  Comparison  of  average  performance 


Method 

Total  #  Molecules 

Total  Efficiency 

Average  Efficiency 

Algorithm  5 

602 

0.00046585 

0.00123043 

Algorithm  0 

404 

0.000694163 

0.000891896 

Algorithm  1 

498 

0.000563136 

0.001196011 

Algorithm  2 

552 

0.000243202 

0.000742978 

Algorithm  3 

505 

0.000555331 

0.00111111 

Algorithm  4 

563 

0.000238451 

0.000654458 

Algorithm  6 

374 

0.000749845 

0.00189321 

Algorithm  7 

347 

0.000322825 

0.000968158 

Algorithm  8 

418 

0.000275404 

0.000809976 
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5.  Conclusions 


We  have  developed  and  compared  efficient  algorithms  for  handling  constraints  in 
the  optimization  of  substitutional  subspaces  of  chemical  compound  space.  The 
best  candidate  was  found  to  be  2  (see  Fig.  3).  Algorithm  6  shows  the  most  promise 
for  finding  good  candidates  with  minimal  effort  both  for  single  runs  as  well  as 
aggregate  samples.  Reordering  substitutions  improves  expected  performance  for 
single  runs  as  both  Algorithm  1  and  Algorithm  3  are  very  competitive  with 
Algorithm  6.  Since  Algorithm  6  differs  from  Algorithm  3  merely  in  solving  the 
subproblem,  there  is  no  additional  computational  cost  associated.  Algorithm  4 
shows  the  worst  performance  both  in  aggregate  as  on  average  due  to  relegating 
entire  subspaces  maximally  far  from  the  current  iterate. 

The  Algorithm  7  family  of  algorithms  performs  surprisingly  well  on  average,  even 
though  it  does  not  utilize  the  underlying  structure  of  the  search  space.  On  the  other 
hand,  aggregate  efficiency  is  poor  for  the  Algorithm  7  family. 

Since  Algorithm  7  consistently  computes  the  fewest  compounds  before 
converging,  it  can  be  recommended  when  only  a  quick  assessment  is  required.  In 
all  other  cases,  Algorithm  6  should  be  considered  superior. 
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Appendix.  Listings 
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Listing  1  energy _run.sh  runs  the  constrained  geometry  preoptimization  and  the  follow-up  full 
optimization.  It  returns  the  final  coordinates. 

# ! / bin /bash 

#  Execution  script 

#  remove  extra  files 
EXEC=$PWD /  g  0  9  _r  u  n 

filename  =  ‘ basename  ${1}  .  dat  ‘ 

mkdir  — p  $1 

if  [  !  — e  "$l/energy"  —a  !  — e  "$l/failed"  ];  then 

cd  $1 

echo  %chk=opt  .  chk  >  pre.com 
cat  ../ header  .  com  »  pre.com 
cat  .  ./$l.zmat  »  pre.com 
echo  %chk=opt  .  chk  >  opt.com 
cat  ../ footer  .  com  »  opt.com 
if  [  !  — e  "pre.log"  ];  then 

NORMAI EXEC=  ‘tail  pre.log  I  grep  — o  termination  ‘ 
if  [  !  -z  "$NORMALEXEC"  -o  !  -e  "opt.  chk"  ];  then 
$EXEC  pre.com  pre.log 
fi 
fi 

$EXEC  opt.com  opt  .  g09_out 

NORMAI EXET=  ‘tail  opt  .  g09_out  I  grep  — o  Normal  I  awk  ’{  print  \ 

$1  }  ’  ‘ 

if  [  "  $NORMALEXEC "  ==  "Normal"  ];  then 

fgrep  opt  .  g09_out  — e  ’SCF  Done’  I  tail  —1  I  awk  ’{  print  \ 
$5 } ’  >  energy 

awk  — f  .  .  /  logcart  .awk  <  opt  .  g09_out  >  xyz 

python  -/ bin  /  retrieve_zmat_from_xyz  .  py  ../Sl.zmat  xyz  \ 

I  grep  c  >  opt.rconsts 

python  -/ bin  /  retrieve_zmat_from_xyz  .  py  .  ,/$l.zmat  xyz  \ 

I  grep  d  >  opt . rvars 
.  .  /  retrieve_zmat_g03  opt  .  g09_out  opt 

This  appendix  appears  in  its  original  form,  without  editorial  changes. 
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bzip2  opt  .  g09_out 
bzip2  *.chk 

cd 

In  — s  $l/energy  $1 . energy 
In  — s  $l/opt.rvars  Sl.rvars 
In  — s  $1  /  opt  .  rconsts  $1  .  rconsts 

else 

echo  —1000000  >  result 
touch  failed 
exit  1 
fi 
fi 


Listing  2  proprty_script.sh  computes  the  properties  and  penalties  of  a  molecule. 

# ! / bin /bash 

#  Execution  script 

#  geometry  optimize  in  Gaussian  and  feed  to  CNDO 

#  run  cndo 

#  read  strongest  excitation 

#  read  beta 

#  remove  extra  files 

EXEC=/apps  /gaussian/ scripts  /  g091_run_d01 
filename  = ‘basename  ${1}  .  dat  ‘ 

mkdir  — p  $1 

if  [  !  -e  "$1/$1 . failed "  -a  -e  " $1 / $1 . xyz "  ];  then 

cd  $1 

#  Compute  Hyperpolarizability 

if  [  !  — e  "$1. result"  ];  then 

#  CNDO 

.  .  /  xyz_to_CNDO  $l.xyz  $1  .  dat  0  1 
In  — s  .  .  /  SOS_input .  txt  . 

In  -s  .  ./INDOIS.  par  . 

echo  $1  I  cndo  >&  cndo_out 

SOSx 

.  .  /  clean_cndo 
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mv  sos*  Sl.sos 

fgrep  $1  .  sos  — e  ’beta  mu’  I  awk  ’{print  $3}’  >  $1. result 
BLA=‘cat  $1. result  ‘ 

.  .  /  f a b s  $BLA  >  $1.  result 
bzip2  $1  .  log 

ed 

In  — s  $1  /$1 . re  s  ult  . 
cd  $1 
fi 

if  [  !  — e  "$1.  penalty"  ];  then 

bunzip2  $l.chk.bz2 

#  Compute  penalty  of  spectrum 

echo  %chk=$l.chk  >  $  l_spectrum  .  com 
cat  ../ spectrum  .  com  »  $  l_spectrum  .  com 
$EXEC  $  1  _spectrum  .  com  Sl_spectrum.log  32 
bzip2  $1  .  chk 

STATEG09=  ‘  fgrep  $  l_spectrum  .  log  — e  Normal  I  awk  \ 

’{print  $1  }  ’  ‘ 

if  [  "$STATEG09"  ==  "Normal"  ];  then 

fgrep  $  l_spectrum  .  log  — e  "  Excited^  S  tate  I  \ 
awk  ’ { print  $7 } ’  >  $1 .nm 

fgrep  $  l_spectrum  .  log  — e  "  Excited^  S  tate I  \ 
awk  ’{print  $9}’  I  awk  -F  =  ’{print  $2}’  >$l.osc 

.  .  /  integrate_windo  w_penalty  ${l}.nm  Sl.osc  \ 

$  {  1  } .  penalty 

BLA=‘cat  $1.  penalty  ‘ 

.  ./fabs  $BLA  >  $1.  penalty 

rm  $1 . chk . bz2 
bzip2  *log 

else 

echo  100000  >  $1  .  penalty 
touch  $1  .  failed 

exit  1 
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fi 

cd 

In  — s  $1  / $1  .  penalty  . 
exit  0 
fi 
fi 

Listing  3  Script  that  sets  up  the  right  links  for  the  optimization  after  a  good  initial  structure 
has  been  identified. 

# / / bin /bash 

n=$l 
echo  $n 

In  — s  $1  $2 

for  i  in  ‘Is  $  { n  } .  *  I  awk  -F  "${n}"  ’{print  $2  }  ’  ‘ ;  do 

#  echo  $ i 

if  [  !  -e  " $2$i "  ];  then 

In  — s  $  1  $  i  $  2  $  i 

fi 

done 

n=$l 
cd  $1 

for  i  in  ‘Is  ${n}*chk*  I  awk  -F  "$n"  ’{print  $2  }  ’  ‘ ;  do 

cp  $  1  $  i  $  2  $  i 

done 

for  i  in  ‘Is  ${n}*.*  I  awk  -F  "$n"  ’{print  $2 }  ’  ‘ ;  do 

#  echo  $i 

if  [  !  -e  "  $2$i "  ];  then 

In  — s  $  1  $  i  $  2  $  i 

fi 

done 

cd 

Listing  4  Input  file  to  determine  the  substitutional  search  space. 

ChemGroup  ( ( 

Z(  #  Framework 
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) 

) 

) 

) 

) 

) 

) 

) 

) 

) 

) 

) 

) 

) 

) 

) 

) 


(N 

3  . 

0 

.000000 

(C 

,  o  . 

1 

.455198 

(C 

,  o  . 

1 

.455202 

(C 

,  0  . 

1 

.382367 

(H 

,  1  . 

1 

.090564 

(H 

,  1  . 

1 

.100788 

(H 

,  1  . 

1 

.096133 

(H 

,  2  . 

1 

.090564 

(H 

,  2  . 

1 

.100785 

(H 

,  2  . 

1 

.096133 

(C 

,  3  . 

1 

.418413 

(C 

,10  . 

1 

.387462 

(C 

,11  ■ 

1 

.410705 

(C 

,12  . 

1 

.410706 

(C 

,13  . 

1 

.387460 

(C 

,12  . 

4 

.060996 

(C 

,15  . 

1 

.413713 

(C 

,16  . 

1 

.387454 

-2  ,  0.000000 

-3  ,  0.000000 

1  ,1  18.635000 
1  ,1  19.869000 
0  ,109.068000 
0  ,1  12.488000 
0  ,11  1.145000 
0  ,109.068000 
0  ,1  12.487000 
0  ,11  1.145000 
0  ,121.400000 
3  ,121.166000 

10  ,121.553000 

11  ,1  17.355000 

12  ,121.552000 

11  ,121.324000 

12  ,120.840000 

15  ,120.846000 


,-l  , 

0.000000 

,  -2, 

0.000000 

,— 3  , 

0.000000 

,  2  , 

165.442000 

,  3  , 

-177.996000 

,  3  , 

61.924000 

,  3  , 

-59.556000 

,  3  , 

178.005000 

,  3  , 

-61.916000 

,  3  , 

59.565000 

,  2  , 

172.824000 

,  o  , 

178.587000 

,  3  , 

0.387000 

,10  , 

0.288000 

,11  , 

-0.288000 

,10  , 

-179.837000 

,11  , 

0.091000 

,12  , 

179.824000 

) 
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) 

) 

) 

) 

) 

) 

) 

) 

) 

) 

) 

) 

) 

) 

) 

) 

) 


(C 

,17  , 

1.412401 

(C 

,18  , 

1.412399 

(C 

,15  , 

1.41371 1 

(C 

,18  , 

4.061134 

(C 

,21  , 

1.413807 

(C 

,22  , 

1.388815 

(C 

,23  , 

1.396983 

(C 

,24  , 

1.396982 

(C 

,21  , 

1.413807 

(N 

,24  , 

1.467005 

(0 

,27  , 

1.233931 

(0 

,27  , 

1.233930 

(H 

,23  , 

1.082963 

(H 

,25  , 

1.082964 

(H 

,10  , 

1.082697 

(H 

,14  , 

1.082697 

(C 

,12  , 

1.420584 

(C 

,15  , 

1.420221 

16  ,120.748000 

17  ,1  18.500000 

16  ,1  18.313000 

17  ,120.750000 

18  ,120.572000 

21  ,120.703000 

22  ,1  18.985000 

23  ,121.768000 

22  ,1  18.856000 

23  ,119.1  16000 

24  ,1  17.856000 
24  ,1  17.855000 
22  ,121.410000 
24  ,119.605000 
11  ,118.420000 
13  ,118.420000 
11  ,121.322000 
16  ,120.843000 


,15  ,  0.004000 

,16  ,  0.022000 
,17  ,  -0.030000 

,16  ,-179.910000 
,17  ,  -0.003000 

,18  ,  179.943000 
,21  ,  0.002000 
,22  ,  0.002000 
,23  ,  -0.006000 

,22  ,-179.996000 
,23  ,  0.008000 

,23  ,-179.992000 
,21  ,-179.995000 
,23  ,  180.000000 
,12  ,-179.648000 
,12  ,  179.647000 
,10  ,-179.945000 
,17  ,  179.930000 


) 
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(C 

,18  , 

,  1.421365 

,  17 

,120.750000 

,16 

,-179.945000 

(C 

,21  , 

,  1.420438 

,  22 

,120.572000 

,23 

,  179.981000 

) 

) 

ReturnConnectorQ 
Connector  ( 

( 

(16,15,35) 

(0.7,  0,  0) 
(0,120,0) 
(0,0,0) 

(  0,  0,  1) 

(  0,  0,  1) 

(  0,  0,  0) 

0 

) 

( 

(20 ,15  ,35) 
(0.7,  0,0) 
(0,120,0) 
(0,0,0) 

(  0,  0,  1) 

(  0,  0,  1) 

(  0,  0,  0) 

0 

) 

( 

(13,12,11) 
(0.7,  0,0) 
(0,120,0) 

(0  ,0 ,180) 

(  0,  0,  1) 

(  0,  0,  1) 

(  0,  0,  0) 

0 
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) 

( 

(11,12,13) 
(0.7,  0,0) 
(0,120,0) 
(0  ,0 ,180) 

(  0,  0,  1) 

(  0,  0,  1) 

(  0,  0,  0) 
0 

) 

( 

(17,16,15) 
(0.7,  0,0) 
(0,120,0) 
(0  ,0 ,180) 

(  0,  0,  1) 

(  0,  0,  1) 

(  0,  0,  0) 

0 

) 

( 

(19,18,17) 
(0.7,  0,0) 
(0,120,0) 
(0  ,0 ,180) 

(  0,  0,  1) 

(  0,  0,  1) 

(  0,  0,  0) 

0 

) 

( 

(22  ,21  ,37) 
(0.7,  0,0) 
(0,120,0) 
(0,0,0) 
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(  0,  0,  1) 

(  0,  0,  1) 

(  0,  0,  0) 

0 

) 

( 

(26,21  ,37) 

(0.7,  0,0) 

(0,120,0) 

(0,0,0) 

(  0,  0,  1) 

(  0,  0,  1) 

(  0,  0,  0) 

0 

) 

) 

alio wed_groups  ( 

(1,2,3  ,4)  #  Subs  ,  H,  F,  Cl  ,  Br 
(1,2, 3, 4) 

(1,2, 3, 4) 

(1,2, 3, 4) 

(1,2, 3, 4) 

(1,2, 3, 4) 

(1,2, 3, 4) 

(1,2, 3, 4) 

) 


(#1 

Z( 

(H,  -3,  0.45  ,  -2,  0,  -1,  0) 

) 

ReturnConnector() 

Connector  () 
allowed_groups  ( ) 

) 
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(#2 

Z( 

(F,  -3,  0.6,  -2,  0,  -1,  0) 

) 

ReturnConnectorQ 
Connector  () 
allowed_groups  ( ) 

) 

(#3 

Z( 

(Cl ,  -3,  0.8  ,  -2,  0,  -1,  0) 

) 

ReturnConnectorQ 
Connector  () 
allowed_groups  ( ) 

) 

(#4 

Z( 

(Br,  -3,  1.2,  -2,  0,  -1,  0) 

) 

ReturnConnectorQ 
Connector  () 
allowed_groups  ( ) 

) 

) 

nconstraints  (1) 

Listing  5  integrate_window_penalty.cc  Computes  the  penalty  function  for  absorbing  in  the 
visible  electromagnetic  range. 

#include  <fstream> 

#include  <iostream> 

#include  <sstream> 

#include  <string> 

#include  <cmath> 

# define  MAX_STEP  20 
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using  namespace  std  ; 


double  tolerance=le  — 14; 

double  num_int  ( double  x,  double  y  ,  double  alpha) 

{ 

double  d=y— x ; 
double  dx=d/100000.0; 
double  r=0.0; 

for(long  i  =0;  i  <100001 ; i ++,x+=dx) 

r +=dx*exp ( —  (1.0  —  x)*(  1 .0  —  x)/(x*x)*alpha*alpha); 

return  r  ; 

} 

double  diff_int_Gaussian  (  double  x  ,  double  y) 

{ 

long  i  ; 

double  z  =  l; 

double  r=0.0; 

const  double  x2=x*x; 

const  double  x4=x2*x2; 

const  double  y2=y*y; 

const  double  y4=y2*y2; 

double  x4i=x; 

double  y4i=y; 

double  rl  =0.0; 

double  r2=0.0; 

for(i=0;i  <MAX_STEP  &&  z>tolerance  ;  i  ++) 

{ 

z  =  1 .0/(double)  (4*  i  +  I)*(y4i— x4i)  — 

( y2*y4i—  x2* x4i  )/(double)  ((4*i+3)*(2*i+l)); 
r  1 +=  1 .0/(  double  )  (4*i+l)*(x4i)  — 

( x2*  x4i  )  / (double)  ((4*  i+3)*(2*  i  +  1)); 
r2 +=  1 .0/(  double  )  (4*i+l)*(y4i)  — 

( y2*  y4i  )  / (double)  ((4*  i  +  3)*(2*  i  +  1)); 
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r +=z  ; 

x4i  *=x4  /  (double)  ((2*  i  +  l)*(2*i  +  2)); 
y4i*=y4/(  double)  ((2*  i  +  l)*(2*  i  +  2)); 

} 

return  r2— rl  ; 

} 

int  main(int  argc  ,  char  *argv[]) 

{ 

if  (argc  <3) 

cerr  «  "Usage «  argv  [0] 

«  "i_,<nmfile  >^<0  scfile  >i_,<NLO_optresultfile  >\n"  ; 
ifstream  log  (  argv  [  1  ]) ; 
ifstream  log2  (  arg v  [  2  ] ) ; 
f stream  result  ; 

bool  done=false  ; 

string  search  ; 

int  number; 
double  eV,cm_l,nm; 
double  strength  ; 
double  penalty  =  —30.0; 
log  »  nm; 
log2  »  strength  ; 
double  alpha=sqrt  (72.0) ; 
while  ( log  .  good  ()  &&  !done)  { 

penalty +=  strength  *num_int  (400 . 0/nm, 7  00 . 0/nm,  alpha  )*nm 

log  »  nm; 

log2  »  strength  ; 

} 

if  (  penalty  <0)  penalty=0.0; 
result . open( argv [3] , ios  :  : in  ) ; 
getline  (  result  , search); 
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result  .  close  () ; 
stringstream  ss2  ; 
double  val=0.0; 
ss2  «  search  ; 
ss2  »  val  ; 
val  — =penalty  ; 

result  .  open  (  argv  [3],ios::out); 
result  «  fabs(val)  «  endl  ; 
result  .  flush  (); 
result . close  () ; 
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List  of  Symbols,  Abbreviations,  and  Acronyms 


CCS 

chemical  compound  space 

EO 

electro-optic 

LCAP 

linear  combination  of  atomic  potentials 

NLO 

nonlinear  optical 

nm 

the  official  abbreviation  for  nanometer  from  the  International  System 

of  Units 

TD-DFT 

time-dependent  density-functional  theory 

VP-DFT 

variation-of-particles  density-functional-theory 
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